Decision Tree Induction

Authors

  • Roberta Siciliano
  • Claudio Conversano
Abstract

Decision Tree Induction (DTI) is an important step of the segmentation methodology. It can be viewed as a tool for analyzing large datasets characterized by high dimensionality and nonstandard structure. Segmentation follows a nonparametric approach, since no hypotheses are made about the distribution of the variables. The resulting model has the structure of a tree graph. It is considered a supervised method, since a response (criterion) variable is explained by a set of predictors. In particular, segmentation consists of recursively partitioning the objects (also called cases, individuals, or observations) into a number of subgroups, on the basis of a suitable partitioning of the modalities of the explanatory variables (the so-called predictors), so that a tree structure is produced. Typically, each partition yields two subgroups, giving binary trees, although ternary trees and r-way trees can also be built. Depending on the type of response variable, which can be categorical or numerical, two kinds of tree structure can be obtained: classification trees and regression trees. Tree-based methods are characterized by two main tasks: exploratory and decision. The exploratory task is to describe, through the tree structure, the dependence between the response and the predictors. The decision task is proper to DTI: it aims to define a decision rule for estimating the unknown response class or value of unseen objects, and to validate the accuracy of the final results. For example, trees are often used in credit-scoring problems to describe and classify good and bad clients of a bank on the basis of socioeconomic indicators (e.g., age, working conditions, family status) and financial conditions (e.g., income, savings, payment methods). Conditional interactions describing the client profile can be detected by following the paths along the tree from the top node to the terminal nodes.
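The recursive binary partitioning described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the abstract names no splitting criterion, so the Gini impurity used here is an assumption, as are the function names (`gini`, `best_split`, `grow`), the `max_depth` stopping rule, and the toy credit data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (predictor_index, threshold) that lowers weighted impurity most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must produce two non-empty subgroups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition the objects into two subgroups, producing a binary tree."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        # Terminal node: label it with the majority response class.
        return {"label": Counter(labels).most_common(1)[0][0]}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

# Hypothetical toy data: (age, income) per client, with a good/bad response.
rows = [(25, 20), (30, 60), (45, 80), (50, 30), (35, 75), (22, 25)]
labels = ["bad", "good", "good", "bad", "good", "bad"]
tree = grow(rows, labels)
print(tree)
```

On this toy data a single split on income separates the two classes, so the tree has one internal node and two terminal nodes. A regression tree would follow the same recursion with a numerical impurity measure (for example, within-node variance) in place of `gini`.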
Each internal node of the tree is assigned a partition (or a split, for a binary tree) of the predictor space, and each terminal node is assigned a class label or value of the response. As a result, each tree path, characterized by a sequence of predictor interactions, can be viewed as a production rule yielding a specific class label or value. The set of production rules constitutes the predictive learning of the response class or value of new objects, for which only measurements of the predictors are known. For example, a new client of a bank is classified as a good or a bad client by dropping the client down the tree according to the splits (binary questions) along a tree path, until a terminal node labeled with a specific response class is reached.
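As a rough illustration of how tree paths act as production rules, the sketch below drops a new object down a small hand-built binary tree and lists one rule per path. Everything in it is assumed for the example: the tree itself, the predictors `income` and `age`, the thresholds, and the helper names `classify` and `rules`.

```python
# Hypothetical binary tree: internal nodes hold a split, terminal nodes hold a label.
tree = {
    "feature": "income", "threshold": 30,
    "left": {"label": "bad"},
    "right": {
        "feature": "age", "threshold": 25,
        "left": {"label": "bad"},
        "right": {"label": "good"},
    },
}

def classify(node, obj):
    """Drop an object down the tree, answering one binary question per internal node."""
    while "label" not in node:
        branch = "left" if obj[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def rules(node, conditions=()):
    """Yield one production rule (list of conditions, label) per root-to-leaf path."""
    if "label" in node:
        yield list(conditions), node["label"]
        return
    f, t = node["feature"], node["threshold"]
    yield from rules(node["left"], conditions + (f"{f} <= {t}",))
    yield from rules(node["right"], conditions + (f"{f} > {t}",))

new_client = {"income": 50, "age": 40}  # only predictor measurements are known
print(classify(tree, new_client))
for conditions, label in rules(tree):
    print(" AND ".join(conditions), "=>", label)
```

Each printed rule is the conjunction of the splits along one path, which is how a path encodes a conditional interaction among predictors.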

Similar resources

Comparing different stopping criteria for fuzzy decision tree induction through IDFID3

Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...

DIAGNOSIS OF BREAST LESIONS USING THE LOCAL CHAN-VESE MODEL, HIERARCHICAL FUZZY PARTITIONING AND FUZZY DECISION TREE INDUCTION

Breast cancer is one of the leading causes of death among women. Mammography remains today the best technology to detect breast cancer, early and efficiently, to distinguish between benign and malignant diseases. Several techniques in image processing and analysis have been developed to address this problem. In this paper, we propose a new solution to the problem of computer aided detection and...

A New Acceptance Sampling Design Using Bayesian Modeling and Backwards Induction

In acceptance sampling plans, the decision on either accepting or rejecting a specific batch is still a challenging problem. In order to provide a desired level of protection for customers as well as manufacturers, in this paper, a new acceptance sampling design is proposed to accept or reject a batch based on Bayesian modeling to update the distribution function of the percentage of nonconfor...

Multi-objective Optimization for Incremental Decision Tree Learning

Decision tree learning can be roughly classified into two categories: static and incremental inductions. Static tree induction applies greedy search in splitting test for obtaining a global optimal model. Incremental tree induction constructs a decision model by analyzing data in short segments; during each segment a local optimal tree structure is formed. Very Fast Decision Tree [4] is a typic...

A decision-tree-based symbolic rule induction system for text categorization

We present a decision-tree-based symbolic rule induction system for categorizing text documents automatically. Our method for rule induction involves the novel combination of (1) a fast decision tree induction algorithm especially suited to text data and (2) a new method for converting a decision tree to a rule set that is simplified, but still logically equivalent to, the original tree. We rep...

Decision Tree Induction using Adaptive FSA

This paper introduces a new algorithm for the induction of decision trees, based on adaptive techniques. One of the main features of this algorithm is the application of automata theory to formalize the problem of decision tree induction and the use of a hybrid approach, which integrates both syntactical and statistical strategies. Some experimental results are also presented indicating that the...

Publication date: 2009